Recognition in 2D Images of 3D Objects from Large Model Bases Using Prediction Hierarchies

Authors

J. Brian Burns

Leslie J. Kitchen

Abstract

An object recognition system is presented that is designed to handle the computational complexity posed by a large model base, an unconstrained viewpoint, and the structural complexity and detail inherent in a single view. The design is based on two ideas. The first is to compute descriptions of what the objects should look like in the image, called predictions, before the recognition task begins. This reduces actual recognition to a 2D matching process, substantially speeding up recognition time for 3D objects (with manageable storage overhead). The second is to represent all the predictions by a single, combined IS-A and PART-OF hierarchy called a prediction hierarchy. The nodes in this hierarchy are partial descriptions that are common to views and hence constitute shared processing subgoals during matching. Many of the problems encountered with large model bases and complex models are reduced by subgoal sharing: projections with similarities explicitly share the representation and recognition of their common aspects. The original contribution of this paper is the automatic compilation, from a 3D model base, of a prediction hierarchy that can be used to recognize objects. A prototype system based on these ideas is demonstrated using a set of polyhedral objects and projections from an unconstrained range of viewpoints.

This research was supported in part by the Air Force Office of Scientific Research under contract number AFOSR-86-0021. Thanks to Al Hanson, Ed Riseman, Michael Boldt and John Dolan for comments, support and help with the figures.

1. Introduction

Object recognition is a central aspect of the process of understanding visual information, helping us to relate what we are seeing to what we have experienced in the past. In spite of much progress in this area, there are crucial problems that have not received adequate attention. One problem is that of representing information about 3D objects in a way that makes matching them to 2D image data efficient and reliable. That is, the geometric analysis required to relate an arrangement of 2D image features to the structure and pose of 3D objects should be sufficient for recognition and yet not involve massive amounts of computation during the time-critical recognition task. Another problem is ensuring that the storage and time complexity grows only slowly with respect to the size of the model base and the complexity of the models.

We emphasize efficiency for model-based vision because of the remarkable ability of humans to rapidly recognize a large number of objects from a range of viewpoints [Biederman85]. Also, while there are other sources of information that seem to be important, specifically scene context [Biederman85, Weymouth86] and model-independent understanding of 3D structure [Marr82], these useful cues may quite often be unavailable, unreliable, or merely a first step towards a full interpretation.

Techniques for relating 3D model information to 2D image data can be partitioned into two basic approaches: prediction cycling and pre-recognition view analysis. In the former, represented by [Brooks81], the system iteratively cycles through three steps: prediction, deciding which projected model structure to search for in the image and what it looks like; observation, searching for image data that match the prediction; and back constraining, using additional properties of the matched data to further constrain the possible 3D poses and structural variations in the object. This approach could be computationally inefficient for large model bases, since the prediction step would involve computing what is common about the projections of a large and possibly complex class of objects from a range of viewpoints. Similarly, manipulation of partially constrained poses during the back-constraining step can be involved [Lowe85].

In the alternative approach, pre-recognition view analysis, all expectations of what to look for in the image are generated before the actual recognition task. Recognition then becomes a 2D matching problem followed by object pose analysis and verification. The characteristic-view based schemes of Chakravarty [82], the property spheres of Fekete [84], the SCERPO system of Lowe [85], the principal views of Cooper [87] and the image-based descriptions of the VISIONS system developed in [Weymouth86] all roughly follow this method. Additionally, this approach has similarities with the photometric stereo interpretation system of Ikeuchi [87].

Another basic idea incorporated into our design is the use of IS-A and PART-OF hierarchical representations ([Marr82], [Brooks81], [Mulder85] and [Weymouth86]). We developed a single, combined IS-A and PART-OF hierarchy called a prediction hierarchy. The nodes in this hierarchy are partial descriptions that are common to views and hence constitute shared processing subgoals during matching. Many of the problems encountered with large model bases and complex models are reduced by subgoal sharing: projections with similarities explicitly share the representation and recognition of their common aspects. The original contribution of this paper is the automatic compilation, from a 3D model base, of a prediction hierarchy that can be used to recognize objects. A prototype system based on these ideas is demonstrated using a set of polyhedral objects and projections from an unconstrained range of viewpoints. A fuller treatment can be found in [Burns87].
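
The subgoal sharing described above can be illustrated with a small sketch. This is not the authors' compilation or matching algorithm, which is not given in this excerpt. It is a toy model in which every name (`Node`, `Matcher`, the feature labels, the two example views) is hypothetical. It assumes that an image is reduced to a set of 2D feature labels and that a node matches when its IS-A parent, its own feature test, and all of its PART-OF components match. Results are cached per node, so a partial description shared by several views is evaluated only once.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A partial description in a toy prediction hierarchy (hypothetical)."""
    name: str
    parts: list = field(default_factory=list)  # PART-OF: components that must all match
    is_a: "Node | None" = None                 # IS-A: more general description, checked first
    feature: "str | None" = None               # optional test: a 2D feature label in the image


class Matcher:
    """Matches nodes against a set of image feature labels, caching shared subgoals."""

    def __init__(self, image_features):
        self.image_features = set(image_features)
        self.cache = {}        # node name -> match result
        self.evaluations = 0   # number of nodes actually evaluated (cache misses)

    def matches(self, node):
        if node.name in self.cache:
            return self.cache[node.name]   # shared subgoal: reuse the earlier result
        self.evaluations += 1
        ok = True
        if node.is_a is not None:
            ok = self.matches(node.is_a)
        if ok and node.feature is not None:
            ok = node.feature in self.image_features
        if ok:
            ok = all(self.matches(part) for part in node.parts)
        self.cache[node.name] = ok
        return ok


# Assumed toy hierarchy: two views share a general description and a quadrilateral face.
polyhedron_view = Node("polyhedron-view", feature="straight-lines")
corner = Node("corner", feature="L-junction")
parallel_pair = Node("parallel-pair", feature="parallel-lines")
quad_face = Node("quad-face", parts=[corner, parallel_pair])
y_junction = Node("y-junction", feature="Y-junction")
triangle_face = Node("triangle-face", feature="triangle")
box_view = Node("box-view", is_a=polyhedron_view, parts=[quad_face, y_junction])
wedge_view = Node("wedge-view", is_a=polyhedron_view, parts=[quad_face, triangle_face])

matcher = Matcher({"straight-lines", "L-junction", "parallel-lines", "Y-junction"})
results = {view.name: matcher.matches(view) for view in (box_view, wedge_view)}
print(results)
print(matcher.evaluations)
```

In this toy case the hierarchy has eight distinct nodes and each is evaluated once, even though `polyhedron-view` and `quad-face` are reached from both views. Without the cache, the shared nodes would be re-evaluated for every view that contains them, which is the cost that grows with the size of the model base.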

Publication date: 1987
